Abstract

Administrative register data are increasingly important in statistics, but, 
like other types of data, may contain measurement errors. To prevent such 
errors from invalidating analyses of scientific interest, it is therefore essential to 
estimate the extent of measurement errors in administrative data. Currently, 
however, most approaches to evaluate such errors involve either prohibitively 
expensive audits or comparison with a survey that is assumed perfect. 

We introduce the “generalized multitrait-multimethod” (GMTMM) model,
which can be seen as a general framework for evaluating the quality of administrative
and survey data simultaneously. This framework allows both survey
and register to contain random and systematic measurement errors. Moreover,
it accommodates common features of administrative data such as discreteness,
nonlinearity, and nonnormality, improving on similar existing models. The use of
the GMTMM model is demonstrated by application to linked survey-register
data from the German Federal Employment Agency on income from and duration
of employment, and a simulation study evaluates the estimates obtained.

KEY WORDS: Measurement error, Latent variable models, Official statistics,
Register data, Reliability

Organization for Scientific Research (NWO) [Veni grant number 451-14-017].


1. INTRODUCTION 

Register data and administrative records play an increasingly important role in
statistics (Wallgren and Wallgren, 2007), and several authors recommend and predict the
increased use of “big data” (Entwisle and Elias, 2013; Podesta, 2014), including
administrative register data (Japec et al., 2015). Uses to date include studies of how
agricultural households affect land changes (Rindfuss et al., 2004), voter turnout
(Ansolabehere and Hersh, 2012), and how people's numerical ability relates to
mortgage default (Gerardi et al., 2013). However, there is evidence that register data may
contain considerable measurement errors (Groen, 2012). For example, Bakker (2012,
p. 15) estimated that 24% of the variance in Dutch official hourly wage records was
random measurement error; Ansolabehere and Hersh (2010, p. 1) reported that
16.1 million out of the 185.4 million listed voter registration records in the United
States were invalid; and Ladouceur et al. (2007, p. 275) suggested that 20% to 30%
of osteoarthritis cases are not registered in Quebec hospital administrative records,
causing bias in prevalence estimates. The measurement error present in administrative
records can severely bias and invalidate research results (Carroll et al., 2006;
Saris and Gallhofer, 2007; Vermunt, 2010). It is therefore essential to evaluate the
extent of measurement error in register data.^1

The difficulty in studying error in register and administrative data, however, is
that there is often no “gold standard” measure. Some authors have suggested linking
administrative registers to a survey, assuming the survey contains no measurement
error (e.g. Yucel and Zaslavsky, 2005). But measurement error in survey data is
widespread (Hansen et al., 1961, 1964; Fellegi, 1964; Andrews, 1984; Alwin, 2007;
Saris and Gallhofer, 2007; Biemer, 2011), and is in fact often measured by taking
administrative records as the “gold standard” (e.g. Kapteyn and Ypma, 2007; Kreuter
et al., 2010; Sakshaug et al., 2010; Kim and Tamborini, 2014). Thus, we often have
two data sources, both measured with error, and we are interested in estimating the
error in both.

Very few studies have attempted to estimate measurement error in both survey
and administrative data simultaneously. Nordberg et al. (2004) discussed a
longitudinal latent Markov model of measurement error in income, but again assumed
the administrative register to be perfect in cross-sectional data; Pavlopoulos and
Vermunt (2013) applied a similar latent Markov model to unemployment data; and
Bakker (2012) and Scholtus et al. (2015) estimated measurement error using linear
factor analysis. However, the models used in these studies have several drawbacks
when applied to administrative register data. First, true values of the variables of
interest are often censored, zero-inflated, gamma, count, or nominal, and thus models
which assume normally distributed true values are not appropriate. For example,
income is usually zero-inflated and occupation is nominal. Second, the measurement
error process in registers is likely to lead to nonnormal and nonlinear errors, yet many
models used to study measurement error assume linear and homoskedastic errors.
For example, top-coding of income causes nonlinear method effects (Gottschalk and
Huynh, 2010), and it is often thought that low earners over-report while high earners
under-report, yielding “mean-reverting” random errors (e.g. Kim and Tamborini,
2014). Third, the measurement quality of administrative data often differs over
observations, yielding a mixture of measurement models. For example, the records
may be obtained from a mixture of sources (Wallgren and Wallgren, 2007), such as
both employer statements and employee self-reports, or the variable may be more
ambiguously defined for some cases than for others: the income of day laborers is an
example. Earlier approaches have not accounted for such heterogeneity. Currently,
then, there is no generally applicable method to evaluate the extent of measurement
error in register and survey data.

^1 We use the terms “register data” and “administrative data” synonymously to avoid repetition.

Our contributions to the literature are twofold: we present a framework for
simultaneously estimating measurement error in register and survey data which addresses
the shortcomings of earlier methods; we then provide guidance on the circumstances
in which survey data or register data are preferable for use in research. Section 2
introduces the modeling framework used to estimate the extent of measurement error
in survey and register data simultaneously, and demonstrates how this framework
encompasses existing methods. Section 3 applies the model to linked survey-register
data on income and duration of employment from the German Federal Employment
Agency, while a subsequent simulation study evaluates the estimates obtained.


2. MEASUREMENT ERROR ESTIMATION FROM MULTIPLE ERROR-PRONE SOURCES

Our technique for simultaneously estimating measurement error in survey and
administrative data builds on the “multitrait-multimethod” (MTMM) approach (Campbell
and Fiske, 1959). Given a set of variables of interest (“traits”) for which observed
measurements exist in both the administrative data and a sample survey, our goal is
to estimate the degree of measurement error in variables observed in both sources.

Let $y_{tm}$ denote an observed random variable measuring the $t$-th trait using the
$m$-th method. In the application described here, $m$ will denote either the administrative
or the survey measurement.

Example 1. Suppose true income from full-time jobs $\eta_1$, part-time jobs $\eta_2$, and other
types of jobs $\eta_3$ are of interest, for instance for future study of their relationship with
educational attainment. Corresponding error-prone observed measures $y_{11}$, $y_{21}$, and
$y_{31}$ are obtained in an administrative register. For a random subsample of cases, we
also have survey measures of the same variables: $y_{12}$, $y_{22}$, and $y_{32}$. There are thus
three “traits” (full-time, part-time, and other income) and two “methods” (register
and survey), and six observed variables. An equivalent view is that $y_{tm}$ results from
a repeated measures design in which the factors “trait” and “method” have been
fully crossed.

2.1 Current approaches to modeling MTMM data 
Commonly, MTMM data are analyzed using the linear model

$$y_{tm} = \tau_{tm} + \lambda_{tm}\eta_t + \gamma_{tm}\xi_m + \epsilon_{tm}, \qquad (1)$$

where $\tau_{tm}$ is the constant systematic bias in $y_{tm}$ and $\lambda_{tm}$ and $\gamma_{tm}$ are constant scaling
factors with respect to the random variables. The “trait factor” $\eta_t$ is a random
subject × trait interaction, symbolizing the “true value” of the trait measured by
$y_{tm}$. The “method factor” $\xi_m$ is a random subject × method interaction, symbolizing
method bias that differs over subjects but is common to variables measured with the
same method. The residual $\epsilon_{tm}$ is random measurement error.

Assuming all $\eta_t$, $\xi_m$, and $\epsilon_{tm}$ follow a multivariate Gaussian distribution, Model 1 is
a confirmatory factor analysis (CFA) model with parameter vector
$\theta := (\tau', \lambda', \gamma', \sigma_\eta', \sigma_\xi', \sigma_\epsilon')'$,
where the parameters have been collected into vectors and $\sigma_x$ denotes the
nonredundant elements of the covariance matrix of $x$, stacked columnwise.

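Under this model the implied product-moment correlation between two observed
variables $y_{tm}$ and $y_{t'm'}$ (for $t \neq t'$) is

$$\mathrm{cor}(y_{tm}, y_{t'm'}) =
\begin{cases}
\tilde\lambda_{tm}\tilde\lambda_{t'm'}\,\mathrm{cor}(\eta_t, \eta_{t'}) & \text{if } m \neq m', \\
\tilde\lambda_{tm}\tilde\lambda_{t'm'}\,\mathrm{cor}(\eta_t, \eta_{t'}) + \tilde\gamma_{tm}\tilde\gamma_{t'm'} & \text{otherwise,}
\end{cases}$$

where $\tilde\lambda_{tm} := \lambda_{tm}\,\mathrm{sd}(\eta_t)/\mathrm{sd}(y_{tm}) = \mathrm{cor}(y_{tm}, \eta_t)$ is the “reliability coefficient” of
$y_{tm}$ and $\tilde\gamma_{tm} := \gamma_{tm}\,\mathrm{sd}(\xi_m)/\mathrm{sd}(y_{tm}) = \mathrm{cor}(y_{tm}, \xi_m)$ is the “method effect”. Thus,
when the measures have been obtained by different methods, the correlation between
two observed error-prone variables is attenuated by a factor $\tilde\lambda_{tm}\tilde\lambda_{t'm'}$ relative to the
correlation between the “true scores” $\eta_t$ and $\eta_{t'}$, a classical result (e.g. Lord and
Novick, 1968; Fuller, 1987). This result shows that it is essential to model both
random measurement error $\epsilon_{tm}$ and individual method biases $\xi_m$: their presence will
have dramatically different effects on subsequent analyses of interest. The MTMM
design allows for the separation of these two error factors.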
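As a numerical illustration of the attenuation result, the following minimal sketch computes implied correlations from standardized coefficients under the linear MTMM model. The function name and all numeric values (reliabilities of 0.7 and 0.6, method effects of 0.4, a true-score correlation of 0.5) are hypothetical and chosen for illustration only; they are not estimates from this paper.

```python
def implied_correlation(rel_a, rel_b, true_cor, meth_a=0.0, meth_b=0.0,
                        same_method=False):
    """Implied correlation between observed measures of two different traits
    under the linear MTMM model.

    rel_a, rel_b   : reliability coefficients, cor(y, eta)
    true_cor       : correlation between the two true scores
    meth_a, meth_b : method effects, cor(y, xi); used only if same_method
    """
    cor = rel_a * rel_b * true_cor      # attenuated trait part
    if same_method:
        cor += meth_a * meth_b          # contribution of the shared method factor
    return cor


# Hypothetical values, for illustration only.
cross_method = implied_correlation(0.7, 0.6, true_cor=0.5)
same_method = implied_correlation(0.6, 0.6, true_cor=0.5,
                                  meth_a=0.4, meth_b=0.4, same_method=True)
print(round(cross_method, 2))   # register-survey pair: 0.21
print(round(same_method, 2))    # survey-survey pair: 0.34
```

In this hypothetical case a true correlation of 0.5 is attenuated to 0.21 across methods, while shared method variance adds 0.4 × 0.4 = 0.16 to the correlation between two measures from the same method.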

This approach has led to a large literature on MTMM modeling using CFA
(structural equation modeling) to estimate the degree of random and systematic
measurement error in survey data (e.g. Alwin, 1973; Andrews, 1984; Saris and Andrews,
1991; Saris and Gallhofer, 2007; Bakker, 2012). Extensions for ordinal categorical
data using the “ordinal factor analysis” model (Muthén, 1983) have also been
applied (Oberski et al., 2008). Recently, Oberski (2013) introduced a latent class factor
(Vermunt and Magidson, 2004) MTMM model.


The MTMM framework is in principle attractive for the modeling of measurement
errors in administrative and survey data. For register data, however, these
currently available MTMM models are inadequate and can yield biased or nonsensical
estimates, for the three reasons given in Section 1: nonnormality of true values,
nonlinearity and heteroskedasticity of errors, and the existence of unknown groups
that exhibit differential measurement error. We generalize the MTMM framework
to allow for these possibilities.


2.2 The generalized multitrait-multimethod model 


We use generalized latent variable models (Skrondal and Rabe-Hesketh, 2004) to
formulate a measurement model for MTMM data from an administrative register
and a survey that can account for non-classical error processes, nonnormal
distributions, and categorical data. Generalized latent variable models are built up from
(i.) linear GLM predictors; (ii.) GLM links and exponential family distributions;
and (iii.) conditional independence relations. The conditional independence
relations we use result from the MTMM design and are common to all MTMM models,
whereas the choice of links and distributions is flexible: for this reason we call our
approach a “generalized multitrait-multimethod” (GMTMM) model. The flexibility
in links allows us to model nonlinearities and heteroskedasticities in the error
process, while the choice of distributions for the latent variables allows for nonnormality
of the true values. Finally, when heterogeneous measurement error processes need
to be accounted for, a finite mixture is used that allows the parameters of the linear
predictors to differ over the mixture components.


(i.) Linear predictors. For continuous observed data, linear predictors for the
observed variables $y_{tm}$ are:

$$\nu_{tm} = \tau_{tm} + \lambda_{tm}\eta_t + \gamma_{tm}\xi_m, \qquad (2)$$

where, for identification purposes, the first loading of each trait factor $\eta_t$ and method
factor $\xi_m$ is often set to unity, $\lambda_{t1} = \gamma_{1m} = 1$. For categorical observed data, linear
predictors for category $y_{tm} = k$ are

$$\nu_{ktm} = \tau_{ktm} + \lambda_{ktm}\eta_t + \gamma_{ktm}\xi_m, \qquad (3)$$

where the first category can be chosen as a reference by setting
$\tau_{1tm} = \lambda_{1tm} = \gamma_{1tm} = 0$ (e.g. Vermunt and Magidson, 2013).

The above linear predictors are common to all population units, and therefore
assume that the measurement process is homogeneous. When the error process is
thought to be heterogeneous, the linear predictor parameters are allowed to differ over
the mixture components, yielding an additional subscript, $\nu_{tm,s}$ or (for categorical
data) $\nu_{ktm,s}$ (Vermunt and Magidson, 2013).


(ii.) Links and distributions. Each of the observed and latent variables is
assigned a distributional “family”, and a link function $g(\cdot)$ connecting the linear
predictor to the expectation of the response $y_{tm}$ is chosen:

$$g(E[y_{tm} \mid \eta_t, \xi_m]) = \nu_{tm}, \quad \text{or} \quad g(E[y_{tm} = k \mid \eta_t, \xi_m]) = \nu_{ktm}, \qquad (4)$$

depending on whether the observed variable is continuous or categorical.

We denote the choice of the conditional distribution of the observed responses
given the latent variables as $f_y := p(y_{tm} \mid \eta_t, \xi_m)$ with parameter vector $\theta_y$. Similarly,
the multivariate distribution of the latent “true score” variables is denoted $f_\eta$ with
parameters $\theta_\eta$ and the distribution of the latent “method” variables $f_\xi$ with
parameters $\theta_\xi$. Depending on whether the variables to which they refer are continuous or
categorical, $f_y$, $f_\xi$, and $f_\eta$ may be probability density or probability mass functions.





Finally, the finite mixture components are assigned a multinomial distribution. 

(iii.) Conditional independencies. The specification of the generalized latent
variable model is completed with assumptions of conditional independence that are
necessary for identification of the model parameters from observables. These
assumptions mirror those of the linear MTMM model.

Assumption 1. The observed variable $y_{tm}$ is conditionally independent of all other
observed variables given its trait factor $\eta_t$ and method factor $\xi_m$.

Assumption 1 implies that the joint conditional distribution of observed given
latent variables can be factored into the univariate conditional distributions, i.e.

$$p(y \mid \eta, \xi, \theta_y) = \prod_{t,m} f_y(y_{tm} \mid \eta_t, \xi_m, \theta_y). \qquad (5)$$

Assumption 2. The latent method factors $\xi$ are mutually independent and
independent of the trait variables $\eta$.

Assumption 2 implies that the latent variable joint distribution can be factored
into

$$p(\eta, \xi \mid \theta_\eta, \theta_\xi) = f_\eta(\eta \mid \theta_\eta) \prod_m f_\xi(\xi_m \mid \theta_\xi). \qquad (6)$$

Note that there may still be dependencies among the latent trait variables in the
vector $\eta$.
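The two independence assumptions can be made concrete by simulating from a linear Gaussian special case with three traits and two methods. This is a minimal sketch, not the estimation code used in the paper: the function names, the unit trait loadings, the method loadings (0 for the register, 0.5 for the survey), and the error variances are all hypothetical values chosen for illustration.

```python
import math
import random


def simulate_mtmm(n, trait_cor=0.5, seed=2024):
    """Draw n subjects from a linear Gaussian MTMM model with three traits
    and two methods. All parameter values are hypothetical."""
    rng = random.Random(seed)
    gamma = {"register": 0.0, "survey": 0.5}              # method loadings
    err_sd = {"register": math.sqrt(0.5), "survey": 1.0}  # error std. deviations
    data = []
    for _ in range(n):
        # Traits may be dependent: a shared component gives
        # var(eta_t) = 1 and cor(eta_t, eta_t') = trait_cor.
        shared = rng.gauss(0.0, 1.0)
        eta = [math.sqrt(trait_cor) * shared
               + math.sqrt(1.0 - trait_cor) * rng.gauss(0.0, 1.0)
               for _ in range(3)]
        # Assumption 2: method factors are independent of each other and of eta.
        xi = {m: rng.gauss(0.0, 1.0) for m in gamma}
        # Assumption 1: given eta_t and xi_m, each y_tm gets its own independent error.
        row = {(t, m): eta[t] + gamma[m] * xi[m] + rng.gauss(0.0, err_sd[m])
               for t in range(3) for m in gamma}
        data.append(row)
    return data


def correlation(data, key_a, key_b):
    """Sample product-moment correlation between two observed variables."""
    a = [row[key_a] for row in data]
    b = [row[key_b] for row in data]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


sample = simulate_mtmm(20000)
cross = correlation(sample, (0, "register"), (1, "survey"))
same = correlation(sample, (0, "survey"), (1, "survey"))
```

With these hypothetical values the model implies a register-survey correlation of 0.5 / sqrt(1.5 × 2.25) ≈ 0.27 and a survey-survey correlation of 0.75 / 2.25 ≈ 0.33, so the sample quantities `cross` and `same` should be close to those figures, up to sampling error.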

Example 1 (continued). The conditional independencies can be displayed in
a graph with directed arrows for GLM regressions and undirected edges denoting
possible (conditional) dependence. Figure 1 shows the GMTMM model for the
six-variable MTMM data from Example 1. In the figure, observed variables $y_{tm}$ are
shown as rectangles while unobserved random variables (factors) are shown as
ellipses. Assumption 1 can be verified by noting that conditioning on the hidden
nodes yields an independence graph (e.g. Lauritzen, 1996).

Figure 1: A generalized multitrait-multimethod (GMTMM) model for three “traits”
using administrative data and a survey as measurement “methods”. The example
traits signify personal income from full-time, part-time, and other kinds of
employment over a certain period. (Layers labeled in the figure: true values, observed
values, random errors, method factors.)


Likelihood. When the error process is thought to be homogeneous, the marginal
likelihood $p(y \mid \theta)$ is

$$p(y \mid \theta) = \int\!\!\int f_\eta(\eta \mid \theta_\eta) \prod_m f_\xi(\xi_m \mid \theta_\xi) \prod_{t,m} f_y(y_{tm} \mid \eta_t, \xi_m, \theta_y) \, d\eta \, d\xi, \qquad (7)$$

where Assumptions 1 and 2 are used and the integral is defined as a sum for discrete
latent variable distributions.

For heterogeneous error processes, in which a mixture of error processes is thought
to be present, define $p(y \mid S, \theta_s)$ as the component-specific marginal likelihood, with
component-specific parameters $\theta_s$. Typically, it is the measurement parameters that
are thought to differ over components: that is, the linear predictors $\nu_{tm,s}$. We then
introduce an unobserved discrete variable $S$ whose number of categories equals the number of
components, so that the marginal likelihood of the observed data becomes

$$p(y \mid \theta) = \sum_s p(S = s)\, p(y \mid S = s, \theta_s). \qquad (8)$$

Since the mixture proportions $p(S)$ are typically unknown, this implies an additional
$|S|$ parameters in $\theta$ to be estimated.
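To make the marginal likelihood tangible, the sketch below integrates a single latent trait out of a deliberately reduced model (one trait, two methods, no method factor, Gaussian errors) and checks the result against the closed-form bivariate normal density. It then forms a mixture likelihood as a weighted sum of component-specific likelihoods. The reduced model, the trapezoid rule, the function names, and every parameter value are assumptions made for illustration; the full GMTMM likelihood involves more latent dimensions and is normally handled by specialized software.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def marginal_likelihood(y1, y2, lam, phi, psi1, psi2, n_grid=4001, width=10.0):
    """Integrate eta out of f_eta(eta) * f_y(y1 | eta) * f_y(y2 | eta) using the
    trapezoid rule on a grid spanning +/- width standard deviations of eta.
    Model: y1 = eta + e1, y2 = lam * eta + e2, eta ~ N(0, phi), e_j ~ N(0, psi_j)."""
    sd = math.sqrt(phi)
    step = 2.0 * width * sd / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        eta = -width * sd + i * step
        value = (normal_pdf(eta, 0.0, phi)
                 * normal_pdf(y1, eta, psi1)
                 * normal_pdf(y2, lam * eta, psi2))
        weight = 0.5 if i in (0, n_grid - 1) else 1.0   # trapezoid end points
        total += weight * value
    return total * step


def closed_form(y1, y2, lam, phi, psi1, psi2):
    """Bivariate normal density implied by the same model."""
    s11, s12, s22 = phi + psi1, lam * phi, lam ** 2 * phi + psi2
    det = s11 * s22 - s12 ** 2
    quad = (s22 * y1 ** 2 - 2.0 * s12 * y1 * y2 + s11 * y2 ** 2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def mixture_likelihood(y1, y2, proportions, components):
    """Weighted sum of component-specific marginal likelihoods."""
    return sum(p * marginal_likelihood(y1, y2, **theta)
               for p, theta in zip(proportions, components))


# Hypothetical components: the second has a much smaller error variance on y1.
theta_a = dict(lam=0.8, phi=1.0, psi1=0.5, psi2=1.0)
theta_b = dict(lam=0.8, phi=1.0, psi1=0.05, psi2=1.0)
numeric = marginal_likelihood(0.3, -0.2, **theta_a)
exact = closed_form(0.3, -0.2, **theta_a)
mixed = mixture_likelihood(0.3, -0.2, [0.7, 0.3], [theta_a, theta_b])
```

In the Gaussian case the integral has a closed form, which is why `numeric` and `exact` can be compared; for non-Gaussian choices of the distributions no such shortcut is generally available and numerical integration or summation is needed.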


2.3 Special cases of the GMTMM model 

By choosing different link functions, distributions, and error structures, a range of
models introduced in the literature to estimate measurement error
in MTMM designs and administrative register data can be obtained as special cases of the
GMTMM model.


Example 2. A common choice is to assume homogeneous errors, the identity link
function $g(x) = x$, and distributions $f_y = N(\nu_{tm}, \sigma^2_{\epsilon_{tm}})$ with Gaussian latent variables
$f_\eta = MVN[0, \Sigma(\theta_\eta)]$, $f_\xi = N(0, \sigma^2_{\xi_m})$, leaving $\Sigma(\theta_\eta)$ unrestricted so that
$\theta_\eta = \sigma_\eta$. This is the linear confirmatory factor analysis MTMM model presented above. This
model was applied to linked survey-register data by Bakker (2012) and Scholtus and
Bakker (2013).


Example 3. Leaving $f_\eta$ and $f_\xi$ unchanged from Example 2, the probit factor model
for binary data results from choosing $f_y = \mathrm{Binomial}[E(y_{tm})]$ with $g = \Phi^{-1}$,
where $\Phi$ is the standard normal distribution function. If, instead, the link function
$g = \mathrm{logit}$ is chosen, a “two-parameter logistic” item response theory MTMM
model is obtained.

Ordered categorical data can be modeled by choosing $f_y = \mathrm{Multinomial}[E(y_{tm} = k)]$,
redefining the observables, and choosing the link function

$$g[\Pr(y_{tm} \le k \mid \eta_t, \xi_m)] = \nu_{ktm}, \qquad (9)$$

where the loadings are set equal over categories, $\lambda_{ktm} = \lambda_{tm}$, $\gamma_{ktm} = \gamma_{tm}$, and the
category-specific intercept $-\tau_{ktm}$ plays the role of a cumulative probit “threshold”
(Rabe-Hesketh et al., 2004). An ordered probit relationship between $y_{tm}$ and the
latent variables is thus specified. This model is known as the “ordinal factor
analysis” model in the structural equation modeling literature (Muthén, 1983) and is a
multidimensional version of the “normal ogive graded response model” in the item
response theory literature (Samejima, 1969).
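The ordered probit relationship can be illustrated by computing category probabilities for given latent values. The sketch below uses the textbook cumulative probit parameterization with increasing thresholds, here written `thresholds`; this parameterization, the function names, and the numeric values are assumptions for illustration and may differ in sign convention from the intercepts used in the text.

```python
import math


def std_normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def category_probabilities(thresholds, lam, gamma, eta, xi):
    """Pr(y = k | eta, xi) for k = 1..K under a cumulative probit model with
    increasing thresholds kappa_1 < ... < kappa_{K-1}:
        Pr(y <= k | eta, xi) = Phi(kappa_k - lam * eta - gamma * xi).
    Loadings lam and gamma are equal over categories."""
    shift = lam * eta + gamma * xi
    cumulative = [std_normal_cdf(kappa - shift) for kappa in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)   # difference of adjacent cumulative probabilities
        previous = c
    return probs


# Hypothetical four-category item evaluated at two values of the trait.
at_zero = category_probabilities([-1.0, 0.0, 1.0], lam=1.0, gamma=0.5, eta=0.0, xi=0.0)
at_one = category_probabilities([-1.0, 0.0, 1.0], lam=1.0, gamma=0.5, eta=1.0, xi=0.0)
```

Raising the trait value shifts probability mass toward the higher categories, while the method factor shifts all items answered with the same method in the same direction.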


Example 4. The CFA and categorical CFA models in Examples 2 and 3 relied on
normally distributed latent variables. It is possible to relax this assumption of
normally distributed latent variables by specifying $f_\eta = \mathrm{Multinomial}(\pi_\eta)$ with free joint
probability vector $\theta_\eta = \pi_\eta$, and univariate distributions $f_\xi = \mathrm{Multinomial}(\pi_{\xi_m})$,
with free univariate probability vectors $\theta_\xi = \pi_{\xi_m}$. The number of latent
categories to which $f_\eta$ and $f_\xi$ refer must be chosen in advance, yielding a finite mixture
or “latent class” MTMM model (Oberski, 2013). When accompanied by the choice
$f_y = \mathrm{Multinomial}$, this model was described as “nonparametric” by Skrondal and
Rabe-Hesketh (2004, sec. 4.4.2) and as “semiparametric” by Heinen (1996).

2.4 Estimation and identification of GMTMM model

The parameters $\theta$ can be estimated from linked survey-register data when there
are at least three “traits”, that is, variables of interest that have been measured with
error in both survey and administrative register. Standard estimation procedures for
generalized latent variable models can be used to estimate the GMTMM model (e.g.
Skrondal and Rabe-Hesketh, 2004, chapter 6). The most general is to use standard
optimization algorithms to maximize the marginal likelihood from Equation 7 or,
for heterogeneous error processes, its mixture counterpart.
For certain models, such as latent class MTMM models, direct maximization of
the marginal likelihood may become unstable. An expectation-maximization (EM)
algorithm can then be used (McLachlan and Krishnan, 2007).


Many of the special cases of GMTMM models, including the examples given
above, can be estimated using standard software for latent variable modeling such
as Latent Gold (Vermunt and Magidson, 2013) or GLLAMM (Rabe-Hesketh et al.,
2004), which implement this estimation strategy. Moreover, specialized efficient
estimation procedures already exist for certain special cases of the GMTMM model. For
example, the linear factor analysis MTMM model can be formulated as a covariance
structure model with a closed-form marginal likelihood (Bollen, 1989). The ordinal
factor analysis (cumulative probit) model can be similarly dealt with by first
computing polychoric correlation coefficients (Muthén, 1983). Such models can be fit using
standard software for structural equation modeling. Other possible combinations of
choices may require specialized software.

2.5 Model identification

The GMTMM model is a latent variable model, and its parameters are therefore
not necessarily identifiable. A first point of interest is whether a given GMTMM
model, such as the ordinal CFA MTMM model (Example 3), will have identifiable
parameters. A second point of interest is how many traits and methods are
minimally required to identify the parameters of any GMTMM model. Assessing
identifiability can be particularly relevant in advance of designing a survey to evaluate
administrative data quality, since this will determine how many questions should be
asked in the survey.

We take parameters to be “identifiable” if and only if a finite number of
parameter values will lead to any given likelihood for all parameter values of nonzero
measure (see Allman et al., 2009, for some of the subtleties involved in this
definition). Trivially, for example, with only one variable observed on one trait using a
single method, it is clearly not possible to establish the parameter values regarding
the latent trait and latent method factor variables separately, since infinitely many
choices of $\theta$ will lead to the same likelihood. On the other hand, the well-known
“label switching” phenomenon in latent class-type models (McLachlan and Peel,
2000) leads to finitely many solutions and is therefore not considered an
identification problem here. Similarly, choices of $\theta$ that lead to rank deficiencies but have a
point mass in the parameter space (see for example Shapiro and Browne, 1983) are
not considered identification problems in this definition.

First, under the definition given, a given GMTMM model’s parameters will be
identifiable if and only if the Jacobian $\partial p(y \mid \theta)/\partial\theta$ is of full column rank almost
everywhere (Catchpole and Morgan, 1997, Theorem 1). Equivalently, the rank of
the information matrix may be examined. For GMTMM models with a closed-form
marginal likelihood, this condition can be established analytically by assessing this
rank using a symbolic algebra program. This may be considered an inconvenience
by many applied researchers, however. For models without a closed-form marginal
likelihood, analytical proofs are even more difficult. Numerical methods are then the
more convenient tool to assess identifiability.

A common numerical approach is to examine the rank of the information matrix
at the maximum likelihood estimate for a given dataset using the same software used
to fit the model. The disadvantage of this method is that it conditions on the data at
hand. For example, a model may appear identified when it is not, due to boundary
solutions, and it may appear non-identified for particular parameter values when it
is identified in the larger parameter space. To overcome this disadvantage, Forcina
(2008) suggested evaluating the rank of the Jacobian at a large number of random
values in the parameter space. This method has been implemented in the software
Latent Gold 5 (Vermunt and Magidson, 2013).
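The idea of checking the Jacobian rank at many random parameter values can be sketched on a toy covariance structure. The example below is not a GMTMM model and is not the implementation found in any software package: it uses a one-factor model with the first loading fixed to one, takes the Jacobian of the implied covariances rather than of the likelihood, and relies on exact rational arithmetic to avoid choosing a numerical rank tolerance. All function names are hypothetical.

```python
from fractions import Fraction
import random


def implied_moments(params, n):
    """Nonredundant covariances of a one-factor model with n indicators.
    params = [lambda_2..lambda_n, phi, psi_1..psi_n]; lambda_1 is fixed to 1."""
    lam = [Fraction(1)] + list(params[:n - 1])
    phi, psi = params[n - 1], params[n:]
    return [lam[i] * lam[j] * phi + (psi[i] if i == j else 0)
            for i in range(n) for j in range(i, n)]


def jacobian(params, n, h=Fraction(1, 1000)):
    """Central-difference Jacobian (rows: moments, columns: parameters).
    Each moment is at most quadratic in any single parameter, so the central
    difference is exact here when computed with rational arithmetic."""
    columns = []
    for k in range(len(params)):
        up, down = list(params), list(params)
        up[k] += h
        down[k] -= h
        columns.append([(a - b) / (2 * h)
                        for a, b in zip(implied_moments(up, n),
                                        implied_moments(down, n))])
    return [list(row) for row in zip(*columns)]


def exact_rank(matrix):
    """Rank by Gaussian elimination; exact because the entries are Fractions."""
    rows = [row[:] for row in matrix]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def ranks_at_random_points(n, n_points=100, seed=7):
    """Set of Jacobian ranks found at random positive parameter values."""
    rng = random.Random(seed)
    n_params = 2 * n                     # (n - 1) loadings + 1 variance + n errors
    found = set()
    for _ in range(n_points):
        params = [Fraction(rng.randint(1, 50), rng.randint(1, 50))
                  for _ in range(n_params)]
        found.add(exact_rank(jacobian(params, n)))
    return found
```

With three indicators the rank equals the number of free parameters (six) at every sampled point, consistent with identifiability; with two indicators the rank is three while there are four parameters, so the toy model is not identified. A practical check for a model without closed-form moments would instead differentiate the likelihood numerically and use a floating-point rank with a tolerance.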
This Section introduced a generalized multitrait-multimethod model that can be
used to estimate measurement error when at least two separate measures of at least
three different phenomena are available. The GMTMM model can deal with
nonnormality of true values, nonlinearity and heteroskedasticity of errors, and the existence
of unknown groups that exhibit differential measurement error. It is therefore
applicable to estimating measurement error in administrative register data and surveys
simultaneously. It is also more generally applicable to situations where such error
structures are thought to exist in multiple error-prone sources.

3. APPLICATION TO ADMINISTRATIVE DATA ON INCOME AND DURATION OF EMPLOYMENT

This Section applies the GMTMM model to a unique dataset provided by the
Institute for Employment Research (Institut für Arbeitsmarkt- und Berufsforschung,
IAB), the research institute of the German Federal Employment Agency
(Bundesagentur für Arbeit, BA). The BA’s normal operations include job placement and
payment of benefits, and for these purposes it maintains an extensive database of
citizens’ (un)employment histories dating back to 1975. This database covers
German employees who are subject to social security contributions as well as recipients
of entitlements, comprising about 86% of the overall German labor force. Excluded
from the register are most civil servants, the self-employed, and others who have
never been in contact with the Agency, such as the never-employed.

Both survey data and the BA’s register data are routinely used for labor market
and policy research, especially the data on income and duration of employment. For
consenting respondents, we gained IRB approval to link administrative record data from
the Agency with a telephone survey conducted by the IAB (IAB Beschäftigtenhistorik
(BEH) Version 09.01.00, Nürnberg 2012). Restricted access to the anonymized linked
survey-administrative data was provided at the Agency’s offices; the raw data cannot
be made publicly available for legal reasons.

Particularly of interest are the BA’s records on income from full-time, part-time, 
and “marginal” employment. “Marginal” employment, also known as “Minijobs”, 
is a common form of low-income employment in Germany, yielding monthly income 
of up to 400 Euro (at the time of data collection); at or below this maximum, the 
employee is exempt from income taxes and social security. Of additional policy 
interest are the durations of the last employment spell of these three employment 
types. These data are not provided by the employees themselves, but rather by their 
employers, who are legally required to report their employees’ income accurately for 
the purposes of taxes, benefits, and social security. 

However, exactly because the income and duration data were collected for the
BA’s administrative purposes, measurement error can become a serious issue for
research in spite of reporting accuracy, because measurement errors in administrative
data need not come from the reporting itself (Bakker, 2009; Groen, 2012). For
example, although the employers will presumably fulfill their mandate to report
accurately, when compiling historical records there may be mismatches and time
lapses in an individual’s record. Similarly, smaller jobs may simply be absent from
the records, again leading to a mismatch in “last part-time job”, for instance. These
issues will lead to random and correlated measurement error for research purposes.

To obtain the survey measurement, a stratified sample of 2,400 respondents was
asked to provide information on income and employment duration from full-time,
part-time, and marginal employment (see Eckman et al., 2014, for further description
of the sample design). The survey had a response rate (AAPOR RR1) of 19.4%. In
the following analyses, we accounted for the sample stratification using complex
sampling adjustments. Of the respondents, 2,284 (95%) provided informed consent
to record linkage between the survey and the administrative registers. This linkage
could be performed using unique person identifiers, so that it seems reasonable to
assume no linkage errors were present. By linking the administrative data to the
survey data, we thus obtained MTMM designs with three traits and two methods,
one for each of the income and duration data.

The register provides income data only at the level of employment spells. This
typically corresponds to an annual basis if a respondent was employed by the same
employer throughout a given year. The survey, however, explicitly asks for the last
monthly income from gainful employment, which is the standard reference period
used in most German surveys. Assuming that salaries are paid evenly throughout
the employment spell, the administrative data were converted to a monthly basis.

3.1 Estimates of reliability and method effects in survey and administrative measures

To demonstrate the flexibility of the GMTMM approach and account for possibly
differing measurement processes in the two measures investigated, we fit different
types of GMTMM models to the duration and income data.


Duration data. For the duration data, we estimate Gaussian GMTMM models:
that is, the familiar linear structural equation model, using the standard SEM
software lavaan for R (Rosseel et al., 2013; R Core Team, 2014). The program code to
estimate this model can be found in the Appendix.

This approach yielded estimates for the trait loadings ($\lambda_{tm}$), method loadings
($\gamma_{tm}$), factor (co)variances ($\sigma_{\xi_m}$, $\sigma_{\eta_t}$), and error variances ($\sigma^2_{\epsilon_{tm}}$). In a linear model,
the quality of each administrative variable can be simply represented by two numbers:
the reliability and the method effect. These represent, respectively, the correlation
between the observed administrative variable and its measured trait, and between
the observed variable and the method factor (Saris and Gallhofer, 2007). A high
reliability indicates that a survey question or register value contains little random
error and accurately reflects the true value it measures. A high method effect, on
the other hand, indicates that a substantial part of the variance is due to factors
shared with other survey or register measures, but which are independent of the true
values. An ideal measure would therefore have reliability one and zero method effect.
Estimates of the reliability and method effects are displayed for the duration data
in Figure 2.

Figure 2: Reliability and method effect estimates for survey data, and reliability
estimates for administrative register data on duration of full-time, part-time, and
“marginal” employment.

Figure 2 shows reliability estimates in the left-hand panel and method effect
estimates in the right-hand panel for the administrative and survey data on duration.
The reliability estimates in Figure 2 are between 0.7 and 0.8 for the administrative
data, which indicates that reliability of the administrative data is acceptable, but
far from perfect. For example, the correlation between administrative records on
full-time duration and the person’s true full-time duration is estimated at 0.7. The
administrative measures’ reliabilities are clearly higher than the survey measures’
reliabilities, which are around 0.6. Thus, the self-reports were somewhat less reliable
than the administrative records, but neither measure was perfect.


While fitting the model, the method effects ($\gamma_{tm}$) and method factor variances
($\sigma_{\eta_m}$) for the administrative measures were estimated at zero but caused
serious dependencies among the parameter estimates. We followed Eid (2000) and Saris and
Gallhofer (2007) in fixing these to zero and re-estimating the model without method
dependencies in the administrative data. The right-hand panel of Figure 2 therefore
shows method effects for the survey measures only. These method effects can be seen
as small for full-time durations, medium for the part-time durations, and very large
for durations of “marginal” jobs. For example, a standardized method effect of 0.4
implies that answers to two survey questions on income will correlate by 0.4 above
and beyond any true correlation between the two measures, thereby inflating
relationship estimates that do not account for method effects. These large dependencies
may be related to survey respondents’ different but systematic interpretations of a
“duration”, or of what counts as a “marginal” job. However, there does not appear
to be any such effect in the administrative data.


Income data. To estimate the quality of the administrative register as well as the 
survey answers on income data, we adapt the model to recognize several aspects of 
the measurement process: 


• Following the econometrics literature (Tobin, 1958), censoring in income is
accounted for;

• The relationship between true income and reported income is thought to be
nonlinear (Kim and Tamborini, 2014);

• Previous studies linking survey and register data (Scholtus, 2015) suggested
that there is a subgroup of respondents for whom the two measures correspond
exactly, whereas for others they do not, possibly suggesting a heterogeneous
error process;

• There is a strong incentive to misreport one’s income from a “Minijob” as being
equal to or below 400 euros, since at the time of the survey this was the legal
maximum income to qualify for tax exemption and social security exemption
(see §8 SGB [Social Security Code]).


Due to these factors, a linear Gaussian MTMM will not suffice. Instead, we choose
$f_y$ to be the standard censored regression equation, use the “nonparametric” latent
class factor analysis formulation of the latent variable distribution and $f_y$ to
allow for nonlinearity (Oberski, 2013), and investigate whether an additional mixture
component of $S$ in which the response is unrelated to the true value fits the data
more closely than a homogeneous error structure. This model is no longer a standard
structural equation model but can be estimated in Latent GOLD 5.0, a software package
for latent class (factor) analysis (Vermunt and Magidson, 2013). Program input can be
found in the Appendix.

The latent class factor analysis model does not impose a distribution on the
latent trait and method factors, but instead approximates these distributions by
discrete interval-level latent variables whose category sizes are estimated from the
data (Vermunt and Magidson, 2004). Moreover, the possibility of a heterogeneous error
structure suggests the presence of an additional discrete nominal latent variable $S$.
Since the number of categories for the latent trait, method, and error structure
variables is unknown, we compare the fit of models with differing numbers of categories
for each of these. Since increasing the number of categories of the method factors
and the error structure variables beyond two never improved the model, we only
show these comparisons for models with differing numbers of categories $K$ for the
latent trait variables ($\eta_t$), with ($|S| = 2$) and without ($|S| = 1$) a
heterogeneous error structure.

       Error process
       Heterogeneous                           Homogeneous
K           LL       BIC       AIC   # par.         LL       BIC       AIC   # par.
2      -5060.0   10413.8   10195.9       38    -5388.3   11024.0   10840.6       32
3      -4758.3    9825.9    9596.6       40    -5272.1   10814.8   10614.1       35
4      -4848.9   10030.3    9783.8       43    -5210.1   10714.1   10496.3       38

Table 1: Fit of GMTMM models for the measurement error in administrative and
survey data on income.

Table 1 shows the fit of these models in terms of loglikelihood (LL), BIC, and
AIC, as well as the number of parameters these models have. The model with three
latent categories and a heterogeneous error process fit the data best in terms of BIC
and AIC. This result suggests that there may indeed be differing error processes for
different respondents. Since the model fit did not improve when increasing the number
of latent categories from three to four, we selected the three-class heterogeneous
model. In other words, we approximate the distribution of true latent income with
a discrete three-category latent variable for which the category sizes are estimated.
We also allowed for some proportion of the observations to be unrelated to the true
value, for example because some fixed value (such as 400 euros) was always chosen
in this group regardless of the true income.
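The model comparison can be retraced from the reported fit statistics. The sketch below copies the values of Table 1, recomputes AIC as $-2\,\mathrm{LL} + 2p$ from the loglikelihood and the number of parameters $p$, and picks the model with the smallest BIC and AIC. The variable and function names are illustrative only. BIC is not recomputed because it additionally requires the sample size.

```python
# Fit statistics copied from Table 1:
# (loglikelihood, number of parameters, reported BIC, reported AIC).
table1 = {
    ("heterogeneous", 2): (-5060.0, 38, 10413.8, 10195.9),
    ("heterogeneous", 3): (-4758.3, 40, 9825.9, 9596.6),
    ("heterogeneous", 4): (-4848.9, 43, 10030.3, 9783.8),
    ("homogeneous", 2): (-5388.3, 32, 11024.0, 10840.6),
    ("homogeneous", 3): (-5272.1, 35, 10814.8, 10614.1),
    ("homogeneous", 4): (-5210.1, 38, 10714.1, 10496.3),
}


def aic(log_likelihood, n_parameters):
    """Akaike information criterion: -2 LL + 2 p."""
    return -2.0 * log_likelihood + 2.0 * n_parameters


# Recompute AIC for every model and select the best model per criterion.
recomputed_aic = {model: aic(ll, p) for model, (ll, p, _, _) in table1.items()}
best_by_bic = min(table1, key=lambda model: table1[model][2])
best_by_aic = min(table1, key=lambda model: table1[model][3])

for model in table1:
    print(model, round(recomputed_aic[model], 1), table1[model][3])
print("best by BIC:", best_by_bic, "| best by AIC:", best_by_aic)
```

The recomputed AIC values agree with the reported ones up to rounding of the loglikelihood.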

Table 2 shows the expected means of the administrative and survey measures
of log-income for different categories of the latent trait and method variables. The
table illustrates how the observed measures are estimated by the model to relate to
the respective latent variables. The relationships in Table 2 are marginalized over
the two categories of the error process latent variable $S$. Thus, the table shows how
the relationship holds for a respondent whose error process is not known in advance.
                                     Trait                  Method       Overall
                                 1       2       3        1       2
Administrative data (log-income)
  Full-time                   1.11    2.69    4.31                          1.85
  Part-time                   0.65    1.54    2.45                          1.08
  Marginal                    0.09    0.23    0.36                          0.21
Survey data (log-income)
  Full-time                   2.20    3.16    4.12     5.52    2.25         2.65
  Part-time                   0.91    1.67    2.45     1.44    1.26         1.28
  Marginal                    0.27    0.33    0.38     0.33    0.32         0.32

Table 2: Estimated relationships between categories of the latent trait variables $\eta$
and the expected observation of log-income from full-time, part-time, and marginal
employment using the administrative and survey measures.


About 5% (not shown in the table) are estimated to belong to the latent category
in which a random value is given, that is, a value that is unrelated to the trait or
method variables.

The model is no longer linear, so that reliability and method effect coefficients,
which represent (linear) correlations, are more difficult to interpret. However, it is
possible to calculate the model-implied reliabilities $\mathrm{cor}(y_{tm}, \eta_t)$ and
method effects $\mathrm{cor}(y_{tm}, \eta_m)$. These estimates, with confidence intervals
based on bootstrapped standard errors, are shown in Figure 3. The figure shows that
while the administrative data on income from full-time and marginal jobs are estimated
to be superior to the survey measures, the survey measure has a stronger linear
correlation with true income level from part-time work. A possible explanation for this
difference is a change in mandatory reporting procedures regarding part-time employment
in the year 2011. On the other hand, the survey measures do exhibit a strong method
dependence, whereas again the administrative register measures were estimated to
have no such method dependence.
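As an illustration of how a model-implied correlation can be obtained when the latent variable is discrete, the sketch below computes the correlation between an observed variable and a latent variable from the category scores, the category probabilities, the conditional means of the observed variable, and a residual variance. It is a simplified sketch: it ignores censoring and the mixture over error processes, and the category scores, class sizes, and residual variance used in the example are hypothetical. Only the three conditional means are taken from the full-time administrative row of Table 2.

```python
import numpy as np


def implied_correlation(scores, probs, cond_means, resid_var):
    """Correlation between y and a discrete latent variable eta.

    scores:     numeric score assigned to each latent category
    probs:      probability of each latent category (sums to one)
    cond_means: E[y | eta = category]
    resid_var:  Var[y | eta], assumed equal across categories
    """
    scores = np.asarray(scores, dtype=float)
    probs = np.asarray(probs, dtype=float)
    cond_means = np.asarray(cond_means, dtype=float)

    mean_eta = probs @ scores
    mean_y = probs @ cond_means
    cov = probs @ ((scores - mean_eta) * (cond_means - mean_y))
    var_eta = probs @ (scores - mean_eta) ** 2
    # Law of total variance: between-category plus within-category part.
    var_y = probs @ (cond_means - mean_y) ** 2 + resid_var
    return cov / np.sqrt(var_eta * var_y)


# Conditional means from Table 2 (administrative, full-time); the scores,
# class sizes and residual variance are hypothetical.
example = implied_correlation(
    scores=[0.0, 0.5, 1.0],
    probs=[0.5, 0.3, 0.2],
    cond_means=[1.11, 2.69, 4.31],
    resid_var=0.2,
)
print(round(float(example), 3))
```

The correlation is below one both because of the residual variance and because the conditional means need not be exactly linear in the category scores.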
Figure 3: Reliability and method effect estimates for survey data, and reliability 
estimates for administrative register data on income from full-time, part-time, and 
“marginal” employment. 

In summary, we found for official administrative data obtained from the German 
Federal Employment Agency that the reliability of both survey and administrative 
data was far from perfect. Estimated relationships between these observed variables 
and other variables of scientific interest will therefore be biased. Moreover, for some 
of these measures, method effects were found that will cause spurious dependencies 
where none exist among the true variables; when using administrative data, method 
dependence may be less of a concern. To prevent biases arising from measurement 
error in substantive analyses of income or duration data, correction methods for
known error processes may be needed (e.g. Saris and Gallhofer, 2007; Vermunt, 2010;
Skrondal and Kuha, 2012).


4. SIMULATION 

We demonstrate some key properties of the maximum likelihood estimates of GMTMM
model parameters using a simulation study. Since there are many possible
GMTMM models that fall within this framework, we choose the model and parameter
values based on the linked survey-register dataset obtained from the German
Federal Employment Agency, and summarize bias and standard error accuracy under
different conditions corresponding to sample sizes.

The response model chosen for the observed variables is a censored regression
in which the unobserved trait and method variables are the regressors and the
dependent variables are six observed indicators corresponding to the crossing of three
traits and two methods. Thus, the response model for the observed variable $y_{tm}$
measuring trait $t$ with method $m$ is

$$
y_{tm} =
\begin{cases}
0, & \text{if } y^*_{tm} < 0 \\
y^*_{tm}, & \text{otherwise,}
\end{cases}
\tag{9}
$$

where $y^*_{tm}$ follows the linear factor model,

$$
y^*_{tm} = \tau_{tm} + \lambda_{tm} \eta_t + \gamma_{tm} \eta_m + \epsilon_{tm},
\qquad \epsilon_{tm} \sim N(0, \sigma_{\epsilon,tm}).
\tag{10}
$$
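The censored response model of equations (9) and (10) can be sketched as a small data generator. The example below is a hedged illustration, not the simulation code used for the study: the function name, the latent category scores, and the uniform sampling of latent categories are assumptions, and $\sigma_{\epsilon,tm}$ is treated as a variance. The intercept, loadings, and error variance in the example are the true values listed for the first indicator in Table 3.

```python
import numpy as np


def simulate_censored_indicator(eta_t, eta_m, tau, lam, gamma, error_var, rng):
    """Draw y_tm from a linear factor model that is censored at zero.

    Returns both the censored observation y and the uncensored y_star.
    """
    eta_t = np.asarray(eta_t, dtype=float)
    eta_m = np.asarray(eta_m, dtype=float)
    # error_var is interpreted as a variance, so the sd is its square root.
    noise = rng.normal(0.0, np.sqrt(error_var), size=eta_t.shape)
    y_star = tau + lam * eta_t + gamma * eta_m + noise
    y = np.where(y_star < 0.0, 0.0, y_star)
    return y, y_star


rng = np.random.default_rng(2024)
trait_scores = np.array([0.0, 0.5, 1.0])   # hypothetical category scores
method_scores = np.array([0.0, 1.0])       # hypothetical category scores
eta_t = rng.choice(trait_scores, size=1000)   # uniform draws, for illustration
eta_m = rng.choice(method_scores, size=1000)

y, y_star = simulate_censored_indicator(
    eta_t, eta_m, tau=1.296, lam=3.772, gamma=-1.025, error_var=0.175, rng=rng
)
print("share censored at zero:", float(np.mean(y == 0.0)))
```

Returning the uncensored values alongside the observations makes it easy to check how much information the censoring removes under a given set of parameter values.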


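The latent variables themselves are discrete interval-level variables with a
multinomial distribution parameterized using the log-linear model

$$
P(\eta_1 = k_1, \eta_2 = k_2, \eta_3 = k_3) \propto \exp(\mu_{k_1 k_2 k_3}),
\tag{11}
$$

$$
P(\eta_m = k) = \frac{\exp(\kappa_{mk})}{\sum_{k'} \exp(\kappa_{mk'})},
\tag{12}
$$

where $\mu_{k_1 k_2 k_3} = \sum_{t=1}^{3} \alpha_{t k_t} + \phi_{12} \eta_{1,k_1} \eta_{2,k_2}
+ \phi_{13} \eta_{1,k_1} \eta_{3,k_3} + \phi_{23} \eta_{2,k_2} \eta_{3,k_3}$.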
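The log-linear parameterization of the joint trait distribution can be made concrete with a short sketch that evaluates the linear predictor for every combination of categories and normalizes the result. Two assumptions are made here and are not taken from the text: the category scores are equidistant on [0, 1], and the intercept of the third category of each trait is fixed to zero for identification. The intercepts and associations in the example are the true values listed in Table 3.

```python
import itertools

import numpy as np


def trait_joint_distribution(alpha, phi, scores):
    """Joint probabilities P(eta_1 = k1, eta_2 = k2, eta_3 = k3).

    alpha:  array of shape (3, K) with log-linear intercepts per trait/category
    phi:    dict mapping trait pairs (t, u), zero-based, to association values
    scores: length-K array of numeric category scores
    """
    alpha = np.asarray(alpha, dtype=float)
    scores = np.asarray(scores, dtype=float)
    n_cat = len(scores)
    mu = np.zeros((n_cat, n_cat, n_cat))
    for ks in itertools.product(range(n_cat), repeat=3):
        mu[ks] = sum(alpha[t, ks[t]] for t in range(3))
        for (t, u), association in phi.items():
            mu[ks] += association * scores[ks[t]] * scores[ks[u]]
    # Subtract the maximum before exponentiating for numerical stability.
    weights = np.exp(mu - mu.max())
    return weights / weights.sum()


alpha = [[0.889, 0.085, 0.0],     # third column fixed to zero (assumption)
         [1.426, -0.305, 0.0],
         [-0.121, -0.356, 0.0]]
phi = {(0, 1): 2.916, (0, 2): -0.992, (1, 2): -0.289}
scores = [0.0, 0.5, 1.0]          # equidistant scores (assumption)

joint = trait_joint_distribution(alpha, phi, scores)
print(joint.sum(axis=(1, 2)))     # marginal category sizes of the first trait
```

Summing the joint table over two of its axes gives the marginal category sizes of the remaining trait, which is how the "size" of a latent class can be read off from the log-linear parameters.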

This model yields the following set of parameters, corresponding to the observed
variable intercepts $\tau_{tm}$, trait loadings $\lambda_{tm}$, method loadings
$\gamma_{tm}$, error variances $\sigma_{\epsilon,tm}$, as well as the latent variable
loglinear intercepts $\alpha_{tk}$ and $\kappa_{mk}$ and latent loglinear
associations $\phi_{tt'}$:

$$
\{\tau_{tm}\}, \{\lambda_{tm}\}, \{\gamma_{tm}\}, \{\sigma_{\epsilon,tm}\},
\{\alpha_{tk}\}, \{\kappa_{mk}\}, \{\phi_{tt'}\}.
$$

Furthermore, corresponding to the selected model from our application, we choose
three categories for the latent trait and two for the latent method variables:

$$
|\eta_t| = 3, \quad |\eta_m| = 2.
$$

To ensure parameter values are realistic, we set them to the maximum-likelihood
estimates found in our application, and vary the sample size across conditions,
$n \in \{200, 500, 1000, 2000\}$. The results of simulating data from this model and
analyzing them using the GMTMM model are summarized in Table 3.

Table 3 summarizes the bias, defined as the difference between the true parameter
value and the simulation average of the maximum likelihood estimate, as well as
the ratio between the average simulation standard error and the
standard deviation over replications (“s.e./sd”).
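The two summary measures can be computed from a set of replications as in the following sketch. It uses a deliberately simple toy estimator (the sample mean of normal data) rather than the GMTMM model, and it adopts the sign convention "average estimate minus true value" for the bias, which is an assumption. The function name and settings are illustrative.

```python
import numpy as np


def summarize_replications(estimates, standard_errors, true_value):
    """Return (bias, s.e./sd ratio) over simulation replications."""
    estimates = np.asarray(estimates, dtype=float)
    standard_errors = np.asarray(standard_errors, dtype=float)
    # Sign convention (assumption): average estimate minus true value.
    bias = estimates.mean() - true_value
    # Average estimated standard error relative to the empirical spread.
    se_sd_ratio = standard_errors.mean() / estimates.std(ddof=1)
    return bias, se_sd_ratio


# Toy simulation: estimate a mean from n = 200 cases, 2000 replications.
rng = np.random.default_rng(1)
true_value, n, n_replications = 0.5, 200, 2000
data = rng.normal(true_value, 1.0, size=(n_replications, n))
estimates = data.mean(axis=1)
standard_errors = data.std(axis=1, ddof=1) / np.sqrt(n)

bias, se_sd_ratio = summarize_replications(estimates, standard_errors, true_value)
print(round(float(bias), 4), round(float(se_sd_ratio), 3))
```

A ratio close to one indicates that the estimated standard errors track the actual sampling variability; values clearly below one indicate standard errors that are too small.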

It can be seen in Table 3 that under all conditions, the bias is small for most
parameters and the estimated standard errors accurately reflect the simulation standard
deviation. Exceptions to this good performance are the latent variable intercepts (e.g.
$\alpha_{21}$ and $\kappa_{11}$) in the condition with the smallest sample size
($n = 200$). Although the bias in this condition is smaller for the other latent
intercept parameters, there is a clear pattern of overestimating the size of the
largest class and underestimating that of the other classes. This bias disappears as
the sample size grows larger. The other parameters do not appear to show any bias,
even at the smallest sample size.

Table 3 also shows the performance of information-based standard errors as an
estimate of simulation standard deviation. The standard errors perform well when
sample size is at least 500. In the smallest sample size condition, some of the standard
errors tend to underestimate the simulation standard deviation, which will lead to
undercoverage of confidence intervals.
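To indicate what an s.e./sd ratio below one means for interval coverage, the sketch below computes the coverage of a nominal 95% Wald interval when the standard error equals a given fraction of the true sampling standard deviation. It assumes an unbiased, normally distributed estimator and a constant ratio, which is a simplification; the function name is illustrative. The example ratio 0.752 is the smallest value reported in Table 3 for $n = 200$.

```python
import math


def implied_coverage(se_sd_ratio, z=1.96):
    """Coverage of a nominal Wald interval estimate +/- z * s.e.

    Assumes the estimator is unbiased and normal with standard deviation sd,
    while the reported standard error equals se_sd_ratio * sd.
    P(|Z| < z * ratio) = erf(z * ratio / sqrt(2)) for standard normal Z.
    """
    return math.erf(z * se_sd_ratio / math.sqrt(2.0))


for ratio in (1.0, 0.9, 0.752):
    print(ratio, round(implied_coverage(ratio), 3))
```

Under these assumptions a ratio of 0.752 corresponds to a coverage of roughly 86% instead of the nominal 95%.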

In summary, while the performance of the maximum-likelihood estimates is generally
good, bias in some of the parameter estimates and many of the standard errors
occurred when the sample size was small ($n = 200$). Therefore, we recommend using
the GMTMM model with samples of at least 500 cases.

5. DISCUSSION AND CONCLUSION 

We showed how the quality of survey and administrative data can be evaluated 
using generalized multitrait-multimethod (GMTMM) models. This approach is an 
improvement over existing methods, which assume that either the survey or the 
administrative data are perfect measures. A general framework for data quality 
evaluation was introduced. This framework is more suited than existing MTMM 
approaches to administrative data particularities such as categorical measurement, 
nonlinearities, heterogeneous error processes, and nonnormality. We demonstrated 
the use of GMTMM models by applying them to administrative and survey data on 
income and duration of employment from the German Federal Employment Agency. 
A simulation study demonstrated good properties of the maximum-likelihood
estimates for a GMTMM model with moderate sample sizes.

A clear advantage of our approach is that it allows for the presence of
measurement error in both the survey and the administrative register. Furthermore,
using the administrative register as a second measure in the MTMM design has
an additional advantage over classical MTMM designs using repeated survey
measures. When repeated survey measures are used, survey respondents must answer
questions on the same topic twice and may remember their answer, creating
dependencies that are not modeled (Alwin, 2011), although van Meurs (1995) provided
some evidence that this might not occur in practice when sufficient time is allowed
between the repetitions. The problem of memory bias does not occur, however, when
the measurement methods are administrative and survey data collected separately.
Therefore, besides allowing for the estimation of measurement error in
administrative records, the MTMM design using linked survey-register data is an
attractive method of estimating measurement error in survey variables.

Some limitations of our work remain. First, we did not discuss model fit
evaluation. However, this issue is not specific to GMTMM modeling, so that the standard
machinery available for global and local fit assessment in generalized latent variable
models can trivially be applied to GMTMM modeling (see, e.g. Skrondal and
Rabe-Hesketh, 2004; Oberski and Vermunt, 2013; Oberski et al., 2013). Second, little
is known about the small sample properties of GMTMM model estimates. While
simulation results by Scholtus and Bakker (2013) on the linear MTMM model were
positive, other types of GMTMM models were not evaluated as to their stability and
robustness. This remains a topic for future research. Finally, in our application on
German data, unique identifiers were available that allowed for close linkage between
the survey and register. In other applications, however, such identifiers may not be
available for legal reasons or they may not exist. In such cases, linkage error will
occur as well as measurement error. Incorporating such errors into the GMTMM
model remains a topic for future study as well.
Table 3: Simulation results for a generalized MTMM model, under different sample
sizes. Shown are the true values of the parameters, the simulation bias, and the
ratio between the average simulation standard error and standard deviation over
replications (“s.e./sd”).

                                Sample size n
                                          n = 200          n = 500         n = 1000         n = 2000
Par.                        True     Bias s.e./sd     Bias s.e./sd     Bias s.e./sd     Bias s.e./sd
$\alpha_{11}$              0.889    0.013   0.956   -0.001   1.002   -0.002   0.968   -0.002   1.013
$\alpha_{12}$              0.085   -0.009   1.001    0.004   1.088    0.008   1.067    0.004   0.994
$\alpha_{21}$              1.426    0.074   0.875    0.027   0.964    0.015   0.962    0.013   0.965
$\alpha_{22}$             -0.305   -0.013   0.943   -0.002   0.999   -0.010   1.020   -0.006   0.985
$\alpha_{31}$             -0.121    0.017   0.996   -0.003   1.040   -0.007   0.960   -0.002   0.955
$\alpha_{32}$             -0.356   -0.007   0.948    0.008   1.015    0.010   1.021    0.006   1.069
$\kappa_{11}$              0.058    0.018   0.752    0.005   0.902    0.005   0.920    0.001   0.939
$\kappa_{21}$             -0.888   -0.015   0.917   -0.008   0.967   -0.003   0.940   -0.005   1.001
$\tau_{11}$                1.296    0.001   0.940    0.003   0.963   -0.000   1.042   -0.001   1.013
$\lambda_{11}$             3.772   -0.017   0.815   -0.004   0.917   -0.000   0.948    0.007   0.943
$\gamma_{11}$             -1.025   -0.007   1.047   -0.003   1.022   -0.004   1.105   -0.002   0.983
$\tau_{21}$                0.693   -0.015   0.943   -0.000   1.049    0.004   1.065    0.003   1.096
$\lambda_{21}$             1.546    0.013   0.956   -0.001   1.005   -0.005   1.010    0.002   0.998
$\gamma_{21}$              0.043    0.031   0.850    0.008   0.953   -0.000   0.973   -0.003   0.954
$\tau_{31}$                0.366    0.001   0.870    0.000   0.988   -0.000   0.943   -0.000   0.991
$\lambda_{31}$            -0.283   -0.001   0.931   -0.000   1.090    0.000   1.032    0.000   1.008
$\gamma_{31}$              0.001   -0.001   0.830   -0.001   0.961   -0.000   1.050   -0.000   1.061
$\tau_{12}$                4.811    0.004   1.025    0.000   1.015    0.005   1.014    0.004   0.950
$\lambda_{12}$             2.029    0.003   0.929   -0.001   0.988   -0.004   0.992   -0.003   0.987
$\gamma_{12}$             -3.169   -0.003   1.026    0.002   1.023   -0.001   1.038   -0.002   0.958
$\tau_{22}$                1.017    0.009   0.915    0.002   0.982   -0.001   0.947    0.002   0.968
$\lambda_{22}$             1.964   -0.003   0.981   -0.001   1.020    0.001   0.960    0.002   0.970
$\gamma_{22}$             -0.224   -0.002   0.902    0.001   1.019    0.003   0.966   -0.000   0.967
$\tau_{32}$                0.384    0.001   0.959   -0.000   0.945    0.000   0.968    0.001   1.094
$\lambda_{32}$            -0.114   -0.002   0.971   -0.000   0.943   -0.000   0.961   -0.001   0.998
$\gamma_{32}$             -0.006   -0.001   0.963   -0.001   0.995   -0.000   1.006   -0.001   1.099
$\phi_{12}$                2.916    0.067   0.882    0.032   1.001    0.020   0.969    0.009   0.986
$\phi_{13}$               -0.992   -0.012   0.895   -0.033   0.950   -0.008   0.912   -0.000   0.997
$\phi_{23}$               -0.289    0.059   0.872    0.020   0.986    0.005   1.016    0.012   0.998
$\sigma_{\epsilon,11}$     0.175    0.004   0.771    0.001   0.934   -0.001   1.005   -0.001   0.984
$\sigma_{\epsilon,21}$     0.420   -0.017   0.993   -0.007   0.971   -0.004   1.055   -0.003   1.074
$\sigma_{\epsilon,31}$     0.003   -0.000   0.891   -0.000   1.031   -0.000   0.932   -0.000   0.941
$\sigma_{\epsilon,12}$     0.545   -0.005   1.043   -0.005   0.931   -0.002   0.940   -0.002   0.980
$\sigma_{\epsilon,22}$     0.141   -0.002   1.067    0.001   1.043   -0.000   1.064    0.000   0.954
$\sigma_{\epsilon,32}$     0.015   -0.000   1.030   -0.000   0.993   -0.000   1.039   -0.000   1.081